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A Maximum Entropy solution of the 
Covariance Extension Problem for Reciprocal 

Processes 

Francesca Carli, Augusto Ferrante, Michele Pavon, and Giorgio Picci 

Abstract 

Stationary reciprocal processes defined on a finite interval of the integer line can be seen as a 
special class of Markov random fields restricted to one dimension. Non stationary reciprocal processes 
have been extensively studied in the past especially by Jamison, Krener, Levy and co-workers. The 
speciaUzation of the non- stationary theory to the stationary case, however, does not seem to have been 
pursued in sufficient depth in the literature. Stationary reciprocal processes (and reciprocal stochastic 
models) are potentially useful for describing signals which naturally live in a finite region of the time 
(or space) line. Estimation or identification of these models starting from observed data seems still to be 
an open problem which can lead to many interesting applications in signal and image processing. In this 
paper, we discuss a class of reciprocal processes which is the acausal analog of auto-regressive (AR) 
processes , familiar in control and signal processing. We show that maximum UkeUhood identification 
of these processes leads to a covariance extension problem for block-circulant covariance matrices. This 
generaUzes the famous covariance band extension problem for stationary processes on the integer line. 
As in the usual stationary setting on the integer Une, the covariance extension problem turns out to be 
a basic conceptual and practical step in solving the identification problem. We show that the maximum 
entropy principle leads to a complete solution of the problem. 
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I. Introduction 

Reciprocal processes have been introduced at the beginning of the last century [|35l . [El, ll36ll 
even earlier than the idea of Markov process was formalized by Kolmogorov. The basic defining 
property is conditional independence given the values taken by the process at the boundary, which 
resembles a widely accepted definition of Markov random fields. When the "time" parameter 
is one dimensional, reciprocal processes can in fact be seen as Markov random fields restricted 
to one dimension. For this reason, reciprocal processes are actually more general than Markov 
processes (a Markov process is reciprocal but not conversely). In fact, these processes naturally 
live in a finite region of the time (or space) variable and specification of boundary values at the 
extremes of the interval is an essential part of their probabilistic description. In discrete-time 
they are naturally defined on a finite interval of the integer 1 ine. Reciprocal processes have been 
extensively studied in the past notably by Jamison, Krener, Levy and co-workers, see [fT9ll , [|20l , 
ll2T]| . Il23]l . Il22]| . ETll . Eell . ifTSll . However the specialization of the non-stationary theory to the 
stationary case, except for a few noticeable exceptions, e.g. [fT9l . Il33l . [|34ll . does not seem to have 
been pursued in sufficient depth in the literature. Stationary reciprocal processes (and reciprocal 
stochastic models) are potentially useful for describing signals which naturally live in a finite 
region of the time or space line. They can be described by constant coefficient models which 
are a natural generalization of the Gauss-Markov state space models widely used in engineering 
and applied sciences. Estimation and identification of these models starting from observed data 
seems to be a completely open problem which can lead to many interesting apphcations in signal 
and image processing. 

In this paper, after a general introduction to stationary processes defined on a finite interval 
(Section [n]), we discuss a class of reciprocal processes described by models which are the acausal 
analog of auto-regressive (AR) processes, familiar in control and signal processing (Section 



III). In section IV we show that maximum likelihood identification of these processes leads 



to a covariance extension problem for block-circulant covariance matrices. This generalizes the 
famous covariance extension problem for stationary processes on the integer line. As in the usual 
stationary setting on the integer line, the covariance extension problem turns out to be a basic 
conceptual and practical step in solving the identification problem. The circulant covariance 
extension problem looks similar to a classical extension problems for positive block-Toeplitz 
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matrices widely studied in the literature, [fT3l . IfTTll . which belongs to the class of band extension 
problems for positive matrices. All problems of this kind are solvable by factorization techniques. 
However the banded algebra framework on which this literature relies does not apply to circulant 
matrices, see [5 1. Circulant band extension appears to be a new kind of matrix extension problem. 

In the present context, we are seeking a (reciprocal) AR extension. One may speculate that this 
extension should possess the analog of the so-called "maximum entropy" property, which holds 
for stationary processes on the line. In the literature, this property is usually presented as a final 
embellishment of the solution which is obtained by factorization techniques (typically computed 
via the Levinson-Whittle algorithm [[24|. [|40ll ). In our case, where there are no factorization 
techniques at hand, we resort to maximum entropy as the main tool at our disposal to attack 
the problem. In Sections |V] and VI we show that the maximum entropy principle indeed leads 



to a complete solution of the problem. Finally in Section VII we discuss the relation with the 
covariance selection results in Dempster's paper [fTTll . 

Band extension problems for block-circulant matrices of the type discussed in this paper occur 
in particular in applications to image modeling and simulation. For reasons of space, we do not 
provide details but rather refer the reader to the literature, see e.g. [|6l, [|71 and ||32l . 

II. Stationary processes on a finite interval 

In this paper, we work in the wide-sense setting of second-order, zero-mean random variables. 
For the benefit of the reader, we recall here that a second order random vector (or more generally 
process) is just an equivalence class consisting of all zero-mean random vectors (or processes), 
each defined on some canonical probability space, say the space of their sample values, that 
have the same covariance matrix, see e.g. [29^ Chap. X ]. Hence, each second order random 
vector contains in particular a Gaussian element which may be taken as the representative of the 
equivalence class, [[T2l p. 74]. All statements of this paper do therefore apply to the particular 
case of Gaussian distributions. In our setting, however, explicit assumptions of Gaussianness will 
not be needed. We also recall that there is a basic correspondence, established by Kolmogorov 
in the early 1940's, between probabilistic concepts depending only on second order m oments 
and geometric operations on certain subspaces of the Hilbert space of finite variance random 
variables, see e.g. [fT2l p. 636-637] for historical remarks on this. We assume henceforth that the 
reader is familiar with this correspondence. 
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Orthogonality of two random vectors will be understood as componentwise uncorrelation, i.e. 
X ± y means Exy^ = 0. The symbol E [ ■ | ■] denotes orthogonal projection (conditional 
expectation in the Gaussian case) onto the subspace spanned by a family of finite variance 
random variables listed in the second argument. 

A m-dimensional stochastic process on a finite interval [1, A^], is just an ordered collection 
of (zero-mean) random m-vectors y := {y(A;), k = 1,2,...,A^} which will be written as a 
column vector with A^, m-dimensional components. We say that y is stationary if the covariances 
Ey(A;)y(j)^ depend only on the difference of the arguments, namely 

Ey(fc)y(jy = Efc_,-, k,j = l,...,N, 

in which case the covariance matrix of y has a symmetric block-Toeplitz structure; i.e{^ 

So ^7 • • • ^ ' 



Eyy' 



Si So S^ 



(1) 



J-'N-l ■ ■ ■ Si So 

Processes y which have a positive definite covariance Sat are called of full rank (or minimal). 
In this paper, we shall usually deal with full rank processes. 

Definition 2.1: A block-circulant matrix with A^ blocks, is a finite block-Toeplitz matrix 
whose block-columns (or equivalently, block-rows) are shifted cyclically. 

It looks like 



Co 
Ci 



Cn-1 
Co 



Ci 



C 



N~l 



c 



N~l 



c. 



N-2 



■• Cn-1 
Ci Co 



where Ci, G 



\ A block-circulant matrix Cat is fully specified by its first block-column (or 



row). It will be denoted by 



Circ{Co,Ci,...,C^_i}. 



(2) 



'Boldface capitals, e.g. Ijv, Sat, etc. denote block matrices made of A'^ blocks, each of dimension m x m. 
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For an introduction to circulant matrices, we refer the reader to the monograph ||8l. Block- 
circulant matrices of a fixed size form a real vector space which is actually an algebra with 
respect to the usual operations of sum and matrix multiplication. The invertible elements of this 
algebra form a group. 

Consider now a stationary process y on the integer line Z, which is periodic of period T, i.e. 
a process satisfying y{k + nT) := y{k) (almost surely) for all n E Z. We can think of y as 
a process indexed on the discrete circle group, Zt = {1, 2, ... , T} with arithmetics mod T |^ 
Clearly, its covariance function S must also be periodic of period T, namely, Sfe+r = for 
all k E Z. Hence, we may also see the covariance sequence as a function on the isomorphic 
discrete group Zt = {0,T— 1} with arithmetics mod T. But more must be true. 

Proposition 2.1: A (second order) stochastic process y on [1, T] is the restriction to the 
interval [1, T] of a wide-sense stationary periodic process y of period T defined on Z, if and 
only if its covariance matrix is symmetric block-circulant. 

Proof: (only if) Let A; G [1, T]. By assumption there is an m-dimensional stationary process 
y on the integer line Z, which is periodic of period T, satisfying y(A; + nT) := y(A;) (almost 
surely) for arbitrary n G Z. By wide-sense stationarity, the covariance function of y must depend 
only on the difference of the arguments, namely 

tkj := E y(fc)y(jy = tk-j , /c, J = 1, . . . , T. 

Moreover, it is a well-known fact that, for any wide-sense stationary process the following 
symmetry relation holds 

= Sj VrGZ, (3) 

that is the covariance matrix of y has a symmetric block-Toeplitz structure. Now since y is 
periodic of period T, its covariance function must also be periodic of period T; i.e. Sfc+nT = 
for arbitrary k,n E Z. Assume, just to fix the ideas, that T is an even number and consider the 
midpoint k = ^ of the interval [1, T]. The periodicity combined with the symmetry property 
^ yields that 

St ,^ = St ,^ ^ = T = VrGZ (4) 

^Whence T + t — t so that T plays the role of the zero element. 
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and since Q holds for r = 0, 1, . . . , |^ — 1, we can say that the function S must be symmetric 



with respect to the midpoint r = ^ of the interval. Hence, we can conclude that the covariance 
matrix of the process y restricted to [1, T]; that is the covariance of y, is a symmetric 
block-circulant matrix, i.e. it must have the following structure 



Si So 



s7 



Si 



So 



which we write 



s7 



St = Circ{So, ^1 



S. 



Si 



s7 

So 



St. 

2- 



St, • • • , Si} 



Similarly, if T is odd, it must hold that St+i 
be written as 

ST = Circ{So, S^, Sj,.. 



S^ 



T-l 



T-l 
2 



0,1,..., 2 



(5) 

1 and St can 



' T-l 
2 



S T-l . 
2 



St, • • • , Si} 



which proves the first part of the statement. 

(if) We want to prove that if y is a process defined on a finite interval [1, T] with a symmetric 
block-circulant covariance matrix St, then it admits a wide-sense stationary periodic extension, 
y, defined on Z of period T. 

Let y be the process obained by periodically extending the process y to the whole interger 
line Z by setting y{k + nT) := y(A;) for arbitrary n E Z and let us denote by S its (infinite) 
covariance matrix. Since S is a covariance matrix, it must be positive semidefinite. What we 
need to show is that it is a symmetric block- Toeplitz matrix. By definition, S is the covariance 
matrix of the infinite column vector formed by stacking y(0), y(l), . . . , y(T), . . . , y{nT), ... in 
that order, it is formed by subblocks which replicate St to produce a square matrix of infinite 
size. Since St is symmetric block-circulant, then S is, in particular, symmetric block- Toeplitz, 
which implies that y is stationary. This concludes the proof. ■ 
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Remark 2.1: The periodic extension to the whole line Z of deterministic signals originally 
given on a finite interval [1, T] is a common device in (deterministic) signal processing. This 
simple periodic extension does however not preserve the structure of a stationary random pro- 
cess since the covariance of a periodically extended process will not be stationary unless the 
covariance function of the original process on [1, T] was center-symmetric to start with. This 
counter-intuitive fact has to do with the quadratic dependence of the covariance of the process 
on its random variables. 

Let for example y be a scalar process on the finite interval [1,4]; i.e. let T = 4 and m — 1. 
Suppose y has covariance matrix = Toepl{cro, cxi, (72, Ua}, the notation Toepl{a} meaning 
that Tit is a symmetric Toeplitz matrix with first column given by the vector a. The upper-left 
2T X 2T comer the covariance of the periodic extension of y is 













(71 


(72 


(72. 


o\ 


ctq 






Ox 


(7q 


(7l 


(72 




Ox 




Ox 


Ol 


(y\ 


(70 


(7\ 


0-3 




0-1 




0-3 


(72 


(7l 


O-Q 






0-2 


0-3 




(7l 


(72 


0-3 


Ox 




CTl 




Ox 


(70 


(7\ 


(72 


(72 




O-Q 


(y\ 


(72 


(7l 


(7q 


(7\ 




(72 


(7l 




0-3 


(72 


(7\ 





This matrix is clearly not Toeplitz unless (73 = a\, in which case would be symmetric 
circulant. Hence the extended process y is in general not stationary. 

Remark 2.2: In many applications to signal and image processing, the signals under study 
naturally live on a finite interval of the time (or space) variable and modeling them as functions 
defined on the whole line appears just as an artifice introduced in order to use the standard 
tools of (causal) time-invariant systems and harmonic analysis on the line. It may indeed be 
more logical to describe these data as stationary processes y defined on a finite interval [1,7"]. 
The covariance function, say St, of such a process will be a symmetric positive definite block- 
Toeplitz matrix which has in general no block-circulant structure. 

It is however always possible to extended the covariance function of y to a larger interval so 
as to make it center-symmetric. This can be achieved by simply letting St+^ := Ej_^_^ for 
T = 0,1,... ,7 — 1. In this way is extended to a symmetric block-circulant matrix St 
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of dimension (2T — 1) x (2T — 1), but this operation does not necessarily preserve positivity. 
Positivity of a symmetric, block-circulant extension, however, can always be guaranteed provided 
the extension is done on a suitably large interval. The details on how to construct such an 



extension are postponed to Section |V| see the proof of Theorem |5.1[ The original process y 
can then be seen as the restriction to the interval [1, T] of an extended process, say y, which 
lives on an interval [1, A^] of length N > 2T — 1. Since the extended covariance is, in any 
case, completely determined by the entries of the original covariance matrix S^, any statistical 
estimate thereof can be computed from the variables of the original process y in the interval 
[1, T] (or from their sample values). Hence, there is no need to know what the random vectors 
{y(A;) ; k = T + 1, . . . , N} look like. Indeed, as soon as we are given the covariance of the 
process y defined on [1, T], even if we may not ever see (sample values of) the "external" 
random vectors {y{k) ; k = T + 1, . . . , N}, we would in any case have a completely determined 
second-order description (covariance function) of y- 

In this sense, one can think of any stationary process y given on a finite interval [1, T] as 
the restriction to [l,T] of a wide-sense stationary periodic process, y, of period N > 2T — 1, 
defined on the whole integer line Z. This process naturally lives on the "discrete circle" Z^. 
Hence dealing in our future study with the periodic extension y, instead of the original process 
y, will entail no loss of generality. □ 

III. AR-TYPE RECIPROCAL PROCESSES 

In this section, we describe a class of random processes on a finite interval which are a 
natural generalization of the reciprocal processes introduced in [|27ll . discussed in J26l and, 
for the stationary case, especially in [|33l , [|34ll , see also IfTSl . In a sense, they are an acausal 
"symmetric" generalization of auto-regressive (AR) processes on the integer line. 

Let y be a zero-mean m-dimensional stationary process on [1, A^] and let Sat denote its 
mN X mN covariance matrix. We assume that Sat is a symmetric block-circulant matrix, so 
that y may be seen as a process on the discrete circle Zjy. In line with what argued in Remark 



2.2 we may, if we wish so, imagine that the matrix Sjv was obtained by extending a positive 
block- Toeplitz matrix as ([T]) to make it symmetric block-circulant. Then [1, A^] will have to be 
identified with an enlarged interval on which y is the periodic extension of some underlying 
stationary process. 
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Let 72 be a natural number such that > 2n. This inequahty will be assumed to hold throughout. 
We introduce the notation y[t-n,t) for the nm-dimensional random vector obtained by stacking 
y(t — n),...,y{t — 1) in that order. Similarly, y[t,t+n] is the vector obtained by stacking 
y(t + 1), . . . , y(t + n) in that order. Likewise, the vector yyt-n.t] is obtained by appending y{t) 
as last block to y^t-n,*), etc.. The sums t — k and t + A; are to be understood modulo N . Consider 
a subinterval (ti, t2 ) C [1, A^] where (ti, ^2 | < ^ < and (ti, denotes the 

complementary set in [1, A^]. 

Let A, 3, C be subspaces of zero mean second order random variables in a certain common 
ambient Hilbert space. Recall that A and 25 are said to be conditionally orthogonal, given 6 if 

(^a-E [a I e]) ± (^b-E [b I e]) , VaG^, VbGS. (6) 

Conditional orthogonality is the same as conditional uncorrelatedness (and hence conditional 
independence) in the Gaussian case. Various equivalent forms of this condition are discussed 
in [|28l . When ^, S, C are generated by finite dimensional random vectors, condition (|6]) can 
equivalently be rewritten in terms of the generating vectors, which we shall normally do in the 
following. The following definition does not require stationarity. 

Definition 3.1: A reciprocal process of order n on [1, A^] is characterized by the property 
that the random variables of the process in the interval (ti, t2 ) are conditionally orthogonal 
to the random variables in the exterior, (ti, t2Y, given the 2n boundary values y{ti~n,t-i] and 
y[t2,t2+n)- Equivalently, it must hold that 

E[y(ti,t2) I y(s), s G (ti, ^2)'] = E[y(i,,t2) I y(t,_„,j,] Vy[t2,i2+n)] , ^1,^2^ [1, A^] • (7) 
In particular, we should have 

t[y{t)\y{s),s^t]=t[y{t)\y^t_n,t)yy^t,t+n]\. te [1, N], (8) 

where the estimation error 

d{t) := y(t) - E [y(t) | y(s), s ^ t] = y(t) - E [y(t) | y[i_„,,) V y^t,t+n]] , t E [1, N] (9) 

must clearly be orthogonal to all random variables {y(s), s 7^ t}; i.e. 

Ey(t)d(s)T =A6ts, t, s G [1, N], (10) 

where S is the Kronecker function and A is a square matrix. The actual meaning of A will be 
clarified a few lines below. In the spirit of Masani's definition [Ulil . d is called the (unnormalized) 
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conjugate proces]^of y. Since d{t + k) is a linear combination of the components of the random 
vector 'y[t+k-n,t+k+n], it follows from ( fTO] ) that both d(t + k) and d{t — k) are orthogonal to 
d(t) as soon as A; > n. Hence the process {d(t)} has correlation bandwidth n; i.e. 



Ed(t + A;)d(t)^ = for n < \k\ < N - n, k e [0, N - 1] . 



(11) 



It follows from ^ that a reciprocal process of order n on [1, A^], can always be described by 
a linear double-sided recursion of the form 



5^Ffcy(t-A;) = d{t) , t e [1, N] 



(12) 



k=—n 



where the F^'s are m x m matrices, in general dependent on t, with Fq = Im and d a process 



of correlation bandwidth n, orthogonal to y in the sense of pO] ). In fact, it follows from ( [T0| ) 
that Ed(t)d(t)^ = A and hence A is the variance matrix of d(t), symmetric and positive 
semidefinite. 



Equation ( [12] ) requires the specification of boundary values, which will be described in Theorem 
O below. 



Lemma 3.1: Ify is stationary, the matrices {Fk} in the representation ([12]) do not depend 
ont.Ify is full rank, they are uniquely determined by the covariance lags of the process up to 
order 2n. 

Proof: The {Fk{t)ys are determined by the orthogonality condition d(t) ± y[t-n,t ) Vy(t,t+n], 
which can be expressed as 



F_„(t) ... F_i(t) Fi(t) ... F„(t) 

• • • ^l" ^1 • • • '^n 



Qn 



(13) 



where 



So 

s7 



.n-l 



Si ... 
So ... 

... s7 



Sn-: 

Si 
So 



Qn ." 



Sn+1 S„+2 
Sn S„_|_i 



-'2n 



-'2n-l 



-'n+1 



(14) 



Also called double-sided innovation. 
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Note that, because of stationarity, none of the covariance matrices depends on t. The determinant 



of the large block-matrix in ( [13] ) is a principal minor of order 2n of Sat. If y is full rank, it 
must be nonzero and the matrix must be invertible. Therefore the matrices {Fk} do not depend 
on t and are uniquely determined. ■ 
For stationary reciprocal processes on Zat, the boundary-values to be attached to the linear 



model ( fT2| ) are a straightforward consequence of the fact that y has a stationary periodic extension 
to the whole axis Z. 

Theorem 3.1: A stationary reciprocal process, y, of order n on Zat satisfies a linear, constant- 



coefficients difference equation of the type ( |T2| ), associated to the 2n cyclic boundary conditions: 

y{k) =y{N + k); k = -n + 1, . . . ,n . (15) 

The model can be rewritten in matrix form as 

F^y = d. (16) 

where F ^ is the N -block banded circulant matrix of bandwidth n, 

:= Circ{/, Fi, . . . , F„, 0, . . . 0, . . . , . (17) 

If the process is full rank this description is unique. 
Proof: By definition 

E[y(l) |y(s), s^l]=E[y(l) | y[i-„„ i) V y(i, i+„]] , 

which is a linear function of y[i-n,i) V y(i,i+„], whereby we can express y(l) as 

y(l) = -F_y(i,i+„] -F+y[i_„,i) + d(l) 

for some coefficient matrices F_, F^. The process y has a periodic extension of period N and 
hence the missing initial boundary vector y[-„+i,i) is actually the same as y[N-n+i,N\^ so that 

y(l) = -F_y(i,i+„] - F+y[7v-„+i,7V] + d(l) . 

By stationarity, the various m x m blocks in the matrices F must satisfy the same system 



of equations ( [13] ) which was derived by imposing the orthogonality condition d(t) ± y[t-n,t) V 
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y(t,t+n]5 for all times t. Since the solution is unique, it must hold that = , k = ±n, . . . , ±1 



where the F^s are the same block matrices introduced before for ( |T8] ). Hence, we have 

-1 n 
k=—n fc=l 

which is the first (t = 1) block equation in ([12]) once the first set of boundary conditions in ( fT5| ) 
is used to replace the missing random variables y^i-n, i)- Similar expressions can be derived for 



y(2), . . . ,y{n) and for y(A^ — n), . . . , y{N). From this it readily follows that y satisfies ([16]) 
where Fat has the banded circulant structure ( [TT] ). ■ 



Using the notations F_ and F+ for 
covariance A = Var{d(t)} can be expressed as 



and 



respectively, the error 



A 



(18) 



Qn 

Qn ^ri 

The following proposition is a simple generalization of analogous statements in [|27l . [|34l for 
n = 1. 

Proposition 3.1: A stationary reciprocal process y is full rank if and only if the variance 
matrix A of the conjugate process is positive definite. 



Proof: (if) Suppose A > 0. Multiplying both members of (16) from the right by y and 



taking expectations, in virtue of the orthogonality relation ( [TO] ), we get 

= F^ E yy^ = E dy^ = diag{ A, . . . , A}. 



(19) 



Thus A > implies that the square matrices Fjy and Sjy are invertible which, combined with 
the positive semidefiniteness of S^v, implies S^v > 0. 

(only if) Suppose now that A is only positive semidefinite. This implies that there exists 7^ 
a E s.t. E a^d(t)d(t)^a = 0, i.e. s.t. a^d(t) = a.s.. This means that the scalar components 



of d(t) are linearly dependent, which, by ([T2j), implies that y{t — n), . . . ,y{t), . . . ,y{t + n) are 
hnearly dependent. Thus Sjv must be singular, which contradicts the assumption Sat > 0. ■ 
Solving ( [T9| ) we can express the inverse as 

diag{A-\...,A-^}F;v 



M 



N 



(20) 



so that Mtv is symmetric block-circulant and positive definite, being the inverse of a matrix with 
the same properties. Furthermore, := A^^F^ , k = —n, . . . ,n and Mq = A^^, must form a 
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center- symmetric sequence of bandwidth n\ i.ej^ 

M_, = M,^, A; = l,...,n. (21) 
If we normalize the conjugate process by setting 

e(t) := A-M(t) (22) 
so that Var{e(t)} = A^^, the model ([12]) can be rewritten 



Mky{t - k) = e{t) , teZ^ (23) 



k=—n 



for which the orthogonality relation ( fTO| ) is replaced by 

EyeT=I^. (24) 

We shall now show that M^v is actually the covariance matrix of the normalized conjugate 
process e. For, by the normalization ( [22] ), our reciprocal process y satisfies the linear equation 

M^y = e (25) 



which implicitly includes the cyclic boundary conditions ([T3j). Multiplying this from the right 
by and taking expectations, we get MjvE{ye^} = E{ee^} which, in force of ( [24] ), yields 



Var {e} = M^v (26) 

as announced. We see that the inverse of the covariance matrix of a full rank stationary reciprocal 
process of order n, must be a banded block-circulant matrix of bandwidth n. 

This is in fact a fundamental characterization of stationary reciprocal processes of order n. To 
prove it, we need to take up the (inverse) question of well-posedness, namely if an autoregressive 



model of the form ( 12) associated to the proper cyclic boundary conditions, determines uniquely 



a process y which is stationary and reciprocal of order n. 



To this end we may just as well examine the equivalent normalized model (25). 



Theorem 3.2: Consider a linear model (25) where Mat is a symmetric positive-definite 



banded block-circulant matrix of bandwidth n and the process {e(t) ; t G Z^} is a stationary 
process on Zj^ with covariance matrix M^y. 

''That is to say that model (|T2j is self-adjoint. 
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Then there is a unique full rank stationary reciprocal process y of order n, solution of (25 1. 
This process satisfies the orthogonality condition (|24]) and e is its normalized conjugate process. 

Proof: Pick a finitely correlated process e with covariance matrix M^f (we can construct 
such a, say Gaussian, process on a suitable probability space) and let y be a solution of the 



equation ( [23] ) with boundary conditions ( [T5| ), equivalently a solution of ([25]). Then, since Mat is 
invertible, the process y is uniquely defined on the interval [1, A^], i.e. there is a unique random 
vector, y, solution of ( [25[ ). Let S^r be its covariance matrix. We have, S^r := E [yy^] = 
E [Mjy^ee^M^^] = M^^, so that Sat is a symmetric positive-definite block-circulant matrix 



and the process y is stationary on Zat (Proposition 2.1). 



By multiplying ( [25[ ) by e and taking expectations, we find M^fE {ye } = M^r, so that 
E{ye^} = Iat, or equivalently E{y(t)e(s)^} = Im^ts- Therefore, the orthogonality ( [24[ ) holds 
on Zat. 

Next, we need to show that y is reciprocal of order n. To this end we shall generalize an 
argument of [34]. Let s < t be two points in [1, N] , which for the moment we choose such 



that t — n > s + n, which is always possible since by assumption N > 2n. Expanding (23 ) and 
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rearranging terms, we can write 

'Mo ... ... 

Ml Mo Mj ■■. 



Mn ... 

M„ 



Ml Mo Mi^ 
... Mo 






■ e(.) 
e(s + l) 

e(s + n) 

e{t — n) 

e{t - 1) 
e{t) 
















M„ 











which can be compactly rewritten as 








. MJ 



M„ 

Ml 
M2 

M„ 



n 

Ml Mo Mj 
... Ml Mo 

... 

... 

... 

... 

... 



y(^) 
y(^ + i) 

y{t + n) 
y{s - n) 



... 

... 

... 

... 

... 

... 

Mj 



m7 





N 
























m2 



y[t-n,t) 
y{s,s+n] 



y{t - n) 
y(t-n + 1) 

- 1) 
y(s + i) 

y{s + n- 1) 

y{s + n) 



(27) 



(28) 



with an obvious meaning of the symbols. Note that M is non-singular, its determinant being a 
principal minor of Mjv, and hence nonzero; while the two random vectors on the right hand 
side are uncorrelated since all scalar components of ejt are orthogonal to the linear subspace 
spanned by (the scalar components of) {y{r); r e [t, s^} and hence are in particular orthogonal 
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to the boundary condition vectors y(s,s+„], y[t-n,t)- Solving ( [28] ) we can express y[t^s\ as a sum 
of two linear functions of G[t^s\ and of y{s,s+n\ V y[t-n,t) so that the orthogonal projection onto 
the linear subspace spanned by (the scalar components of) {y(r) ; r G [t, s^} results in a linear 
function of (the scalar components of) y[t-n,t) V y(^s,s+n] alone. This proves the conditional 
orthogonality of y[f,s] to the other random variables of the process, given the boundary values 

y[t-n,t) , y{s,s+n]- 

The argument remains valid also when the non overlapping condition t — n > s+n does not hold; 
i.e. for an arbitrary interval [t, s] of the discrete circle Z^. For, when [t — n, t) and ( s, s + n] 
overlap clearly we have [t, s]"^ C [t — n, t) U ( s, s + n] and hence all random variables in the 
subspace spanned by {y{f) ; t E [t, sY} are contained in the subspace spanned by the boundary 
conditions, say 6 := {y(r) ; r G [t — t) U ( s, s + n]}. This means that E [y(r) | C] = y('r), 
or equivalently that 

y(r)-E[y(r) | C] =0, r G [t, sf 

so that the second member in (|6]) is zero and hence the orthogonality condition trivially holds. 

■ 

From this result, we obtain the following fundamental characterization of reciprocal processes 
on the discrete group Z^r. 

Theorem 3.3: A nonsingular mN x mN -dimensional matrix S^v is the covariance matrix 
of a reciprocal process of order n on the discrete group Z^ if and only if its inverse is a 
positive -definite symmetric block-circulant matrix which is banded of bandwidth n. 

Note that the second order statistics of both y and e are encapsulated in the covariance matrix 
Mat. In other words, the whole auto-regressive model of y is defined in terms of the matrix 
Mat. Note also that this result makes the stochastic realization problem for reciprocal processes 
of order n conceptually trivial. In fact, given the covariance matrix S jy (the external description 
of the process), assuming that it is in fact the covariance matrix of such a process, the model 
matrix Ma? can be computed by simply inverting Sat. This is the simplest answer one could 
hope for. The solution requires however a preliminary criterion to check whether a (full rank) 
symmetric block-circulant covariance matrix has a banded inverse. There seems to be no simple 
known answer to this question. 

Finally, to make contact with the literature, we note that a full rank reciprocal process of order 
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n can always be represented as a linear memoryless function of a reciprocal process of order 1 . 
This reciprocal process, however, need not be of full rank. To see that this is the case, introduce 
the vectors 



yit) 



y(t + n- 1) 



y(t-n + l) 



yit) 



(29) 



Letting x(t)'^ : = 



(yt V {yt 

x(t) 

yit) 







we find the representation 

x(t) 



F+ 


x(t - 1) + 














F_ 



d(t) 



Olio 







(30) 
(31) 



where F_ and are the block-companion matrices 

" JO... " 
/ ... 
/ 

-F -Fi 



I 





/ 



-F_ 






and d{t) = | [O ... d{t)^ d(t)^ ... O]^ has a singular covariance matrix. This model is in 
general non-minimal [[34ll . 

IV. Identification 

Assume that T independent realizations of one period of the process y are available]^ and let 
us denote the string of sample values hy y := {y^^\ . . . ,y^^^)- We want to solve the following 

Problem 4.1: Given the observations y of a reciprocal process y of (known) order n, estimate 
the parameters {Mk} of the underlying reciprocal model MAry = e. 

Note first that if we are given 2n + l covariance data {S^ ; A; = 0, 1, ... , 2n}, the identification 
of an order n reciprocal process can be carried out by a linear algorithm, namely by solving the 



Yule-Walker-type system of linear equations (13). 



This procedure is however unsatisfactory since, due to the symmetry pT] ), there are actually 
only n + 1 unknown to be computed. Hence, one would expect only n + 1 covariance lags 



For example, a "movie" consisting of T successive images of the same texture. 
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to be needed, while the system ( [T3| ) requires solving also for the negative order coefficients. 
Moreover, in practice, the S^'s will have to be estimated from observed data and estimates of 
covariances with a large lag k will unavoidably be more uncertain and have a larger variance. 

In an attempt to get asymptotically efficient estimates for the M^'s, we consider maximum 
likelihood estimation. To this end, we set up a Gaussian likelihood function (which does not 
require to assume that y has a Gaussian distribution, see IfTSl p. 112]), which uses the density 
function 

P(Mo,...,M„)(y) = / ^ exp [-ly^MNy] , 



where y E M™^. Taking logarithms and neglecting terms which do not depend on the parameters, 
one can rewrite this expression as 

logp(A./o,...,A./„)(y) = - ^ log det (M]^i) - ^ tr {M^ yy'^} (32) 

Assuming that the T sample measurements are independent, the log-likelihood function, depend- 
ing on the n + 1 matrix parameters {M^ ; = 0, 1, . . . , n}, can be written 

n 

L{Mo, . . . , MO = log det (M^) - tr {M^ {y) } (33) 

fc=0 

where each matrix- valued statistic Tk{y) has the structure of a sample estimate of the lag k 
covariance of the process. For example, Tq and Ti are given by: 

T f N-l 



Toil) = 

t=l I k=0 
T (N-l 



t=l K k=l 

+ |X:y«(iV-i)b«(0)]^ 

t=l 

From exponential class theory fT], we see that the T^'s are (matrix- valued) sufficient statis- 



tics. Indeed, we have the well-known characterization that the (suitably normalized) statistics 
To, Ti, . . . ,Tn are Maximum Likelihood estimators of their expected values, namely 



So := ^To = M.L. Estimator of Ey(A;)y(A;)'^ 



(34) 



S„ := = M.L. Estimator of Ey(A; + n)y{ky . 
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Let us now consider the following matrix completion problem, which, form now on, will be 
referred to as the block-circulant band extension problem. 

Problem 4.2 (Block- Circulant Band Extension Problem): Given n + l initial data mxm 
matrices Sq, • • • , Sn, complete them with a sequence S„+i, S„+2, • • • , Sjv_i, in such a way to 
form a positive definite symmetric block-circulant matrix Sat with a block-circulant banded 
inverse of bandwidth n. 

Note that the model parameters (Mq, Mi, . . . , M„) are the nonzero blocks of the (banded) 



inverse of the covariance matrix Sat of the process (Theorem 3.3). The invariance principle for 
maximum likelihood estimators [42J leads then to the following statement. 

Theorem 4.1: The maximum likelihood estimates of{MQ, Mi, . . . , M„) are the nonzero blocks 
of the banded inverse of the matrix S solving the block-circulant band extension problem with 
initial data the n + l covariance estimates (1341). 



Hence, solving the original identification problem |4.1| has been shown to lead to the solution 

is 



of a block-circulant band extension problem. Note, however, that the extension problem 4.2 



nonlinear and it is hard to see what is going on by elementary means. Below we give a scalar 
example. 

Example 4.1: Let m = l, N = 8, n = 2 and assume we are given the covariance estimates 
o'o, o"i, cr2, forming a positive definite Toeplitz matrix. The three unknown coefficients in the 



reciprocal model ([23]) of order 2 are scalars, denoted mg, mi, m2. Multiplying ( [25] ) from the 
right by y^, we get MjySjv = Iat, which leads to 

mo mi m2 m2 mi 

mi mo mi m2 m2 

m2 mi mo mi m2 

m2 mi mo mi m2 

m2 mi mo mi m2 

m2 mi mo mi m2 

m2 m2 mi mo mi 

mi m2 m2 mi mo 

where xs := 0-3 = 0-5 and 0:4 := (74 are the unknown extended covariance lags. Rearranging and 









<3-i 




0-2 




X3 




X4 








0-2 
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eliminating the last three redundant equations, one obtains 

moO"o + Imidi + 2m2a"2 = 1 

moO"! + mi((To + 0-2) + m2(o-i + X3) = 

mo(T2 + 'mi((3"i + ^3) + m2(o-o + X4) = 

moXs + mi ((72 + X4) + 7712(6-1 + X3) = 

moa;4 + 2miX3 + 2m2(3"2 = 

which is a system of five quadratic equations in five unknowns whose solution already looks 
non-trivial. It may be checked that, under positivity of the matrix Toepl{(To, (Ti, (T2}, it has a 
unique positive definite solution (i.e. making Mat positive definite). 



At first sight the circulant band extension problem of Theorem 4.2 recalls the classical band 
extension problems for Toeplitz matrices studied in [|T3l . [fTTl . which is solvable by factorization 
techniques. However, the banded algebra framework on which these papers rely does not apply 
here. The circulant band extension problem seems to be a new (and harder) extension problem. 
General covariance extension problems are discussed in an illuminating paper by A. P. Dempster, 
[fTT|. Notice, however, that Dempster's procedures, having been conceived to solve a general 
covariance extension problem, do not exploit the circulant structure of the present setting and are 
computationally very intensive even for small scalar instances. A possible approximate approach 
to the circulant band extension problem was proposed in ||6l. This approach, based on a result 
of B. Levy [|25l , exploits the fact that for — > 00 the problem becomes one of band extension 
for infinite positive definite symmetric block- Toeplitz matrices, for which satisfactory algorithms 
exist. For finite however, this approximation may in some cases turn out to be poor. In the 
next section, we propose a new approach to the circulant band extension problem. 

V. Maximum entropy on the discrete circle 

Dempster's paper, which deals with general, unstructured covariance matrices, only considers 
Gaussian distributions. He solves the following extension problem: Characterize, among all 
covariance matrices sharing a given set of entries, the one corresponding to the (zero-mean) 
maximum entropy Gaussian distribution. For our purposes, a key observation is Statement (b) 
in [fm p. 160]. In our setting, it reads as follows. 
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Proposition 5.1: Assume feasibility of the covariance extension problem. Among all covari- 
ance extensions of the data Sq • • • , Sn, there exists a unique such an extension whose inverse 's 
entries are zero in all the positions complementary to those where the elements of the covariance 
are assigned. This extension corresponds to the Gaussian distribution with maximum entropy. 
This principle of entropy maximization will lead us to a new convex optimization procedure for 
computing the band extension. 

We hasten to remark that in this paper we are not restricting ourselves to the case of Gaussian 
distributions. We shall consider Sjv to be the matrix variance of a Gaussian distribution only 
for the purpose of interpreting the following optimization problem in the light of Dempster's 
result. The far reaching implications of our maximum entropy principle for general probability 



distributions is provided in Theorem 7.2 below. 



Notations 

Let Uat denote the block-circulant "shift" matrix with N x N blocks, 

. 



U 



N 



0/^0 



Im 



/n, 











where denotes the m x m identity matrix. Clearly, U^Uat = UatU^ = ImN\ i-C- Uat is 
orthogonal. Note that a matrix C with N x N blocks is block-circulant if and only if it commutes 
with Ujy, namely if and only if it satisfies 

VJ^CVn = C. (35) 
Recall that the differential entropy H{p) of a probability density function p on W is defined 

by 

H{p) = — \og{p{x))p{x)dx. (36) 
In case of a zero-mean Gaussian distribution p with covariance matrix Sat, we get 

Hip) = ^ log(det S;v) + (1 + log(27r)) . (37) 
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Let & TV denote the vector space of symmetric matrices with N x N square blocks of dimension 
m X m. Let T„ G ©n+i denote the Toeplitz matrix of boundary data: 

So s7 ... 

Si 



So 



(38) 



and let En denote the x (n + 1) block matrix 








... 





Im 


... 














Im 








... 



The Maximum Entropy problem on ^L^ 

Consider the following Gaussian maximum entropy problem (MEP) on the discrete circle: 
Problem 5.1: 



min {—tr log Sat 
subject to : 



I^n ^nEu — 



S,v e &N, Sat > 0} 



UatStvUat 



J AT. 



(39) 

(40) 
(41) 



Recalling that trlogSAf = logdetSAr and (37), we see that the above problem indeed amounts 
to finding the maximum entropy Gaussian distribution with a block-circulant covariance, whose 
first n + 1 blocks are precisely So, . . . , S„. The circulant structure is equivalent to requiring 
this distribution to be stationary on the discrete circle Z^. We observe that in this problem we 
are minimizing a strictly convex function on the intersection of a convex cone (minus the zero 
matrix) with a linear manifold. Hence we are dealing with a convex optimization problem. 



Note that we are not imposing that the inverse of the solution Sat of Problem 5.1 should have a 
banded structure. We shall see that, whenever solutions exist, this property will be automatically 
guaranteed. 
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The first question to be addressed is feasibility of (MEP), namely the existence of a positive 



definite, symmetric matrix Sjv satisfying ([40|)-([4T]). Obviously, T„ positive definite is a necessary 
condition for the existence of such a S^v. In general it turns out that, under such a necessary 
condition, feasibility holds for N large enough. The idea is that for — ^ oo, Toeplitz matrices 
can be approximated arbitrarily well by circulants ( ll30ll . [f39ll ) and hence existence of a positive 
block-circulant extension can be derived from the existence of positive extensions for Toeplitz 
matrices. 

Theorem 5.1: Given the sequence Sj G M™^"^, i = 0, 1, . . . ,n, such that 

T^ = Tl> 0, (42) 

there exists N such that for N > N, the matrix T„ can be extended to an N x N block-circulant, 
positive-definite symmetric matrix Stv- 

Proof: A fundamental result in stochastic system theory is the so-called maximum entropy 
covariance extension. It states that, under condition ( [42] ), there exists a rational positive real 
function $+(2) = ^ + C{zl - A)-^B such that 

1) A has spectrum strictly inside the unit circle. 

2) Si = CA'-^B, i = l,2,...,n. 

3) The spectrum $(2;) := $+(z) + $+(-2) is coercive, i.ej^ 

3e>0 such that $(ej'') > el, W G [0,27r). (43) 

In fact $(2;) has no zeros on the unit circle since it can be expressed in the form <l>(z) = 
Ln{z^^y^ AnLn{z)^^ whcrc Ln{z~^) is the n — th Levinson-Whittle matrix polynomial (also 
called n — th matrix Szego polynomial) of the block Toeplitz matrix T„, and A„ = > 0; see 

ioi, m and m. 

Let Si := CA^^^B, i = n + 1, n + 2, . . . , so that $+(2) = + Y.T=i ^i^^'^ and define 
Circ ^Sq, S7, SJ, . . . , SX-i , Siv-i , Siv-i_j^, . . . Sij , odd 



Circ ( So, S7, Sl, . . . , S]^_2 , + Siv , Siv^, Y.n^, , . . . Si ) , even 

V T 2 2 2 ^ / 



(44) 



2 2 

We need to show that there exists iV such that Sjv > for > iV. To this aim, notice that S^r 
is, by definition, block-circulant so that, a similarity transformation induced by a unitary matrix 

* Here, and in the following, j denotes the imaginary unit \f^-\. 
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V reduces S^v to a block-diagonal matrix: 

V*S7vV = := diag (^o, ^i, • • • , *7v-i) , 
where V is the Fourier block- matrix whose k, l-th block is 

Vki = l/TiVexp [-j2n{k - 1)(/ - 1)/N] 
and '^i are the coefficients of the finite Fourier transform of the first block row of Eat: 



VI/, = Eo + eJ'^^S^ + (eJ^O^J + " " " + (e^''^) ^2 + (e^''^) Si, (45) 
with t?f := -2'nt/N, see e.g. (Ml Sec. 3.4]. Clearly, (eJ'^^)^"* = (eJ''^)"' and hence 

= $ (gj''^) - [5^j^ [(^^^) + 5$^ (eJ''^)] (46) 

where, 

00 00 I N—l 



2 ' 



N odd 



-_h+x i=h+i I N/2, N even 

(47) 

Since A is a stability matrix, if A^, and hence h, is large enough, {^'^'^) + ^^^r (e^'''^) is 
dominated by el, i.e. there exists N such that 

(5$7v {(^'^') + 5$^ (ej''^) < el, We, > N (48) 



so that it readily follows from (43) and (46) that if > A^, > for all i. 



We observe that, given T„, the triple A, B,C can be explicitly computed so that we can 



compute e and N for which (48) holds. In other words. Theorem 5.1 provides a sufficient 
condition that can be practically tested. Similar bounds, but valid only for the scalar case, were 
derived in ifTOll . 

VI. Variational analysis 

We shall introduce a suitable set of "Lagrange multipliers" for our constrained optimization 
problem. Consider the linear map A : &n+i x &n defined by 

^(A, 6) = EnAEj^ + VnQUI - e, (A, 6) G 6„,+i x 6^. 

and define the set 

:= {(A, 6) G (6n+i X 6^) I (A, 6) G (ker(A))^, {E^AEJ + VnOVJ, - 6) > 0}. 
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Observe that is an open, convex subset of (keT:(A))-^. For each (A,©) e £+, we consider 
the unconstrained minimization of the Lagrangian Junction 

L(Sjv, A, ©) := -tr log + tr (A (^^^Sjv^^n - T„) ) + tr (6 (U^^S^tUat - ^n) ) 
= -tr log S;v + tr (E„AE^Sjv) - tr (AT„) + tr (UjveU^S^) 

-tr(es^) 

over Gn,+ '■= {^n ^ &n, '^n > 0}. For ^S^r e ©at, we get 

5L(Siv, A, O; 5Sjv) = -tr (S]^^5Siv) + tr {EnAE^S^EN) + tr ((Uiv©U;^ - ©) ^Sjv) . 
We conclude that 5L(Sjv, A, 9; SY^n) = 0, WSY^n e 6jv if and only if 

i:],' ^ E^AE^ + Vr,evl - e. 

Thus, for each fixed pair (A, ©) e £+, the unique minimizing the Lagrangian is given by 

Y% = {E^AEl + U^eU^ - e) . (49) 
Consider next L(S^, A, O). We get 

L(S^, A, 6) = -trlog {{E^AEl + U^GU^^ - 6)"') 
+tr [(KA£;^ + UivOU;^ - e) {Er^AEl + U^eU^^ - e)~'] - tr(AT„) (50) 
= trlog {Er^AEl + UivGUj^ - 6) + tr/„iv - tr (AT„) . 

This is a strictly concave function on whose maximization is the dual problem of (MEP). 
We can equivalently consider the convex problem 

min{J(A,e),(A,e) e £+}, (51) 

where J (henceforth called dual function) is given by 

J(A, 0) = tr (AT„) - tr log {EnAEl + IJnQ'^I - ©) ■ (52) 
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Existence for the dual problem 

The minimization of the strictly convex function J(A, 0) on the convex set £+ is a challenging 
problem as is an open and unbounded subset of {kei{A))-^. Nevertheless, the following 
existence result in the Byrnes-Lindquist spirit, lfT6l . jSl, [fT4ll can be established. 

Theorem 6.1: The function J admits a unique minimum point (A, 0) in £+. 
In order to prove this theorem, we need first to derive a number of auxiliary results. Let 
denote the vector subspace of block-circulant matrices in &m- We proceed to characterize the 
orthogonal complement of <Lm in &n- 

Lemma 6.1: Let M E &n- Then M G {*^n)'^ if and only if it can be expressed as 

M = UjvA^U^ - N (53) 

for some N E &n- 

Proof: By ( |35| ), Cat is the kernel of the linear map from ©at to &n given by M i— )■ 
U^MUat — M. Hence, its orthogonal complement is the range of the adjoint map. Since 

tr ((U^MUjv - M)N) = (U^MU^ -M,N) = (M, UjviVU^^ - A^), 

the conclusion follows. ■ 
Next we show that, as expected, feasibility of the primal problem (MEP) implies that the dual 
function J is bounded below. 



Lemma 6.2: Assume that there exists Sat G &n,+ satisfying {40)-{41). Then, for any pair 
(A, 0) G £+, we have 

J(A,0) > mA^ + trlogS;v (54) 



Proof By (|40|), tr(AT„) = tr(AE^Siv^„) = tr(E„AE^S7v). Using this fact and Lemma 
|6.1[ we can now rewrite the dual function J as follows 

J(A,0) = tr(AT„)-trlog(E„AET + U^0U;^-0) 

= tr [{EnKEl + VnQ^I - 0) Stv] - trlog {EnKEl + VnQ^I - 0) • 

Define M(A, 0) = {E^AE^ + UAr0U)^ - 0) which is positive definite for (A, 0) in C+. Then 

J(A,0) =tr(M(A,0)Siv) -trlogM(A,0). 
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As a function of M, this is a strictly convex function on &n,+, whose unique minimum occurs 
at M = where the minimum value is tr(/mAr) + tr log Sat. ■ 
Lemma 6.3: Let {Ak,Qk),n > 1 be a sequence of pairs in such that || (A^, 6/,) || oo. 
Then also \\A (A^., 0^) || — )■ oo. It then follows that ||(Afc, 0a,.)|| — >■ oo implies that J{Ak, Qk) — >■ 
oo. 

Proof: Notice that A is a linear operator between finite-dimensional linear spaces. Denote 
by (Tm the smallest singular value of the restriction of A to {ker A)-^ (the orthogonal complement 
of kei A). Clearly, cr^ > 0, so that, since each element of the sequence (A^, 9^) is in (ker A)-*-, 
\\A{Ak,Qk) II > c^m II (Afc, 6^)11 ^ oo. 

Assume now that ||y4 (Ak, Qk) \\ = \\ {EnAkEj^ + UAr0A,.U^ — 0^) || — t- oo. Since these are 
all positive definite matrices and all matrix norms are equivalent, it follows that 

tr [EnAEl + U,v0U;^ - 6) ^ oo. 

As a consequence, tr [[EnAE^ + UAfQU^ — 0) Stv) — )■ oo and, finally, J{Ak, ©a,) — )■ oo . ■ 
We show next that the dual function tends to infinity also when approaching the boundary of 
£+, namely 

dC+ := {(A, 6) G (6.+1 X 6^)|(A,e) G (ker(A))^, [E^AEI + \J ^QVl - 6) > 0, 

det {EnAEl + VnQ^I - 0) = 0}. 

Lemma 6.4: Consider a sequence (A^, 0^), k > 1 in such that the matrix 
limfc (^EnAkE^ + UAr0fcU^ — 0^) is singular Assume also that the sequence (A^, 0^) is bounded. 
Then, J{Ak, 0^) — > oo. 
Proof: Simply write 

J{Ak, 0fc) = - log det {E^AkE^ + U^vO^U^ - 0^) + tr(A,,Tfe). 

Since tr(AfcT,fc) is bounded, the conclusion follows. ■ 



Proof of Theorem 6.1 Observe that the function J is a continuous, bounded below (Lemma 



6.2 1 function that tends to infinity both when ||(A, 0)|| tends to infinity (Lemma 6.3) and when 



it tends to the boundary dC^ with ||(A, 0)|| remaining bounded (Lemma 6.4). It follows that J 
is inf-compact on £+, namely it has compact sublevel sets. By Weierstrass' Theoreni[^ it admits 
at least one minimum point. Since J is strictly convex, the minimum point is unique. □ 

'a continuous function on a compact set always achieves its maximum and minimum on that set. 
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VII. Reconciliation with Dempster's Covariance Selection 



Let (A, 6) be the unique minimum point of J in (Theorem 6.1 ). Then G &n,+ given 



by 



(55) 



satisfies (40) and (41). Hence, it is the unique solution of the primal problem (MEP). Since it 



satisfies (41 ), is in particular a block-circulant matrix and hence so is 

Let TT^j^ denote the orthogonal projection onto the linear subspace of symmetric, block-circulant 



matrices ftiv. It follows that, in force of Lemma 6.1 



TT,-^ {E^KEl + U^veu;^ - e) = TTe, (^nA^^) . (56) 



Theorem 7.1: Let be the maximum Gaussian entropy covariance given by ((55]). Then 



(S^)~^ is a symmetric block-circulant matrix which is banded of bandwidth n. Hence the 
solution of (MEP) may be viewed as the covariance of a stationary reciprocal process of order 
n defined on Zjy. 
Proof Let 

^Ho lij iq ... Hi' 



Ha := n^, {EnAEj) 



Hi Ho n7 



Hi Ho uj 

... Hi no_ 

be the orthogonal projection of [EnAE^^ onto Cat. Since is symmetric and block-circulant, 
it is characterized by the orthogonality condition 

tr [{E„Ae'^, - Ha) C] = {EnAEj - Ha, C) = 0, VC G 

Next observe that, if we write C = Circ [Cq, Ci, C2, . . . , Cj, Cj] and 

Aqo ^01 Aon 

AJq All • • • Ai„ 



In- 



(57) 



A 



Ar 



A 



Kk 
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then 

tr [E^KElC] = tr [AE^CE„] = tr [(Aqo + An + . . . + A„„)Co 

+ (Aoi + Ai2 + . . . + K-l,n)Cl + ...+ AonCn 
+ (Aio + A21 + . . . , An,n~l)Cj + ...+ AnoCj] . 

On the other hand, recalling that the product of two block-circulant matrices is block-circulant, 
we have that tr [H^^C] is simply times the trace of the first block row of 11^ times the first 
block column of C. We get 

tr [Uj,C] = Ntr [HoCo + HJCi + nJCs + . . . + U2CJ + u,cj] . 



Hence, the orthogonality condition (57), reads 

tr [{EnAE^ - Ha) C] = tr [((Aqo + An + . . . + A„„) - NUo) Co+ 

+ ((Aoi + Ai2 + ...+A„_i,„)-Arn7)Ci 
+ ((Alo + A21 + . . . , A„,„_i) - NU^) Cj 

+ ...{Aon-NUj)Cn + {AnO-NUi)Cj)] 

Since this must hold true forall C G €n, we conclude that 

no = ^(Aoo + An + ...+A„„), 

Hi = ^(Aoi + A12 + . . . + An-l,nV, 



n - — A^ 



while from the last equation we get Ilj = 0, forall i in the interval n + l<i<N — n — 1. 
From this it is clear that the inverse of the covariance matrix solving the primal problem (MEP), 
namely 11^ = (S^)^^ has a circulant block-banded structure of bandwidth n. ■ 
Since the beginning of Section V, we have been dealing only with Gaussian distributions in 
order to facilitate the comparison with Dempster's classical results. It is now time to show that 
the Gaussian assumption can be dispensed with, and our solution is indeed optimal in the larger 
family of (zero-mean) second-order distributions. 
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Theorem 7.2: The Gaussian distribution with (zero mean and) covariance defined by 



(55) maximizes the entropy functional (36) over the set of all (zero mean) probability densities 



whose covariance matrix satisfies the boundary conditions (40), (41). 



Proof Let CAr(T„) be the set of (block-circulant) covariance matrices satisfying the bound- 



ary conditions ( [40] ), ( [41] ) and let p^, be a probability density with zero mean and covariance 
S. In particular, we shall denote by g^, the Gaussian density with zero mean and covariance 
S. Now, by a famous theorem of Shannon [[37l . the probability distribution having maximum 
entropy in the class of all distribution with a fixed mean vector (which we take equal to zero) 
and variance matrix S, is the Gaussian distribution g^. Hence: 



max < max \H{p-^)\ > = max {Hiq-s)} 

see:^^(T„) ps ' ' " ] see:Ar(T„) 

where the maximum in the right-hand side is attained by (^s^. ■ 
The above can be interpreted as a particular covariance selection result in the vein of Demp- 
ster's paper; compare in particular [[TTl Proposition a]. In fact the results of this section substan- 
tiate also the maximum entropy principle of Dempster (Proposition [5 ■1[ ). It is however important 
to note that none of our results follows as a particular case from Dempster's results, since |11| 
deals with a very unstructured setting. In particular our main result (Theorem |7.1[ ) that the 
solution, S^, to our primal problem (MEP) has a block-circulant banded inverse, is completely 
original. Its proof uses in an essential way the characterization of the MEP solution provided 
by our variational analysis and cleverly exploits the block-circulant structure. 

Actually, our results, together with Dempster's, may be used to show that the maximum 
entropy distribution, subject only to moment constraints (compatible with the circulant structure) 
on a block band and on the comers, is necessarily block-circulant, i.e. the underlying process is 
stationarjj^ 

Because of the equivalence of reciprocal AR modeling and the underlying process covariance 



having an inverse with a banded structure, explained in Section III, we see that the Maximum 
Entropy principle leads in fact to (reciprocal) AR models. This makes contact with the ever- 
present problem in control an signal processing of (approximate) AR modeling from finite 
covariance data, whose solution dates back to the work of N. Levinson and P. Whittle. That 

*An alternative proof of this fact can be constructed based on the invariance properties of the entropy functional and its 
strict concavity. This has been recently accomplished (in a more general framework) in l4l . 
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AR modeling from finite covariance data is actually equivalent to a positive band extension 
problems for infinite Toeplitz matrices has been realized and studied in the past decades by 
Dym, Gohberg and co-workers, see e.g. [fT3ll . [[TtII as representative references of a very large 
literature. We should stress here that band extension problems for infinite Toeplitz matrices are 
invariably attacked and solved by factorization techniques, but circulant matrices do not fit in 
the "banded algebra" framework used in the literature. Also, one should note that the maximum 
entropy property is usually presented in the literature as a final embellishment of a solution 
which was already obtained by factorization techniques. Here, for the circulant band extension 
problem, factorization techniques do not work and the maximum entropy principle turns out to 
be the key to the solution of the problem. 

This fact, together with Dempster's observation ifTTl Proposition b], may be taken as a proof 
(although referred to a very specific case) of a very much quoted general principle that maximum 
entropy distributions are distributions achieving maximum simplicity of explanation of the data. 

Finally, we anticipate that the results of this section lead to an efficient iterative algorithm for 
the explicit solution of the MEP which is guaranteed to converge to a unique minimum. This 
solves the variational problem and hence the circulant band extension problem which subsumes 
maximum likelihood identification of reciprocal processes. This algorithm, which will not be 
described here for reasons of space limitations, compares very favorably with the best techniques 
available so far. 

VIIL Conclusions 

A new class of stationary reciprocal processes on a finite interval has been introduced which 
are the acausal analog of autoregressive (AR) processes on the integer line. Maximum likelihood 
identification of these AR-type reciprocal models is discussed. The computation of the estimates 
of the matrix parameters of the model turns out to be a particular instance of a Covariance 
selection problem of the kind studied by the statistician A.P Dempster in the early seventies. 
In matrix terminology, the covariance selection for stationary reciprocal models is equivalent to 
a special matrix band extension problem for block-circulant matrices. We have shown that this 
band extension problem can be solved by maximizing an entropy functional. 
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